Project
Title
Description
Hardware
Software
AI-based detection and identification of electrical components.
A mounted camera above a surface (part of the product)
Produces a controlled environment live feed for the application
An application running inference on a live USB camera feed (or, optionally, on an
imported picture or video)
Application
Modification of the provided data to simulate environmental differences that would
occur in a real-life scenario, and to introduce imperfections to both train against and
test against.
Augmentation examples
Addition of glare
Rotation
Blurring
Addition of spots
GUI
The application is built with Qt Creator, using C++
Inference
Running in C++
Utilising
Summary
Detection of objects from an image via inference
Detect and display a bounding box for each identified class, along with the
confidence value of the detection, from the input image using inference.
Identification
Post-processing of the components inside the bounding boxes detected by
inference, which may carry additional information that can be extracted by a variety
of approaches.
Examples
LEDs
Resistors
Resistor code value
LED color
Technology
AI based Electrical Component Identifier
IC Components
Pin count
Information written on the component
Features
Inference
Classes present in the dataset that the model will be trained upon
Resistor
Diode
Capacitor
LEDs
Integrated Circuits
AC
DC
LDR
Milestones
Base camera rig
Initial inference model training
Inference running
Testing with video footage from a mobile device
Research
Models
Ultralytics YOLO
Live Labeling
Focus Audience
Set Rig
The fixed position of the camera, the significantly reduced distance to the objects,
the consistent lighting provided by the ring light, and the static background all boost
the confidence of the inference considerably.
Training
Running
Post-processing
Rationale
Timeline
Gantt Chart
Live training
YOLOv5
Training
1st batch, test run
Image Count
100
Classes (1 Total)
Resistor
2nd batch
Image Count
Training
Evaluation
20
Training
Evaluation
1800
540
3rd batch
Image Count
Training
2393
Evaluation
724
Classes (9 Total)
red_led
green_led
blue_led
yellow_led
ac_capacitor
dc_capacitor
resistor
sip_resistor
pcb_terminal
Classes (10 Total)
red_led
green_led
blue_led
yellow_led
ac_capacitor
dc_capacitor
resistor
sip_resistor
pcb_terminal
metal_nut
Augmentation
Default
Augmentation
Default
Average time per epoch
34 seconds
Epoch count
400
Epoch count
250
Epoch count
300
Augmentation
Default
YOLOv8
Due to the angle and lighting both being known and mostly set thanks to using a set
rig, the input dataset does not need to cover angles and lighting outside what the rig
will expose it to during runtime.
The sum of all the points covered above results in a significant reduction in the data
required to train, when compared to a setup without a set rig, for equivalent
confidence values during runtime.
The angle range is reduced to a top-down view only, eliminating the rest of the
angle range.
While the lighting will change depending on the room conditions, the ring light
around the camera will provide significant consistency in lighting.
While this does not eliminate the necessity to train against various lighting
conditions, it does reduce their significance and increases the certainty of the detection.
Only the components being detected need to be captured at all angles, as opposed
to the camera gathering the dataset having to be positioned at different angles.
Having a top-down view also eliminates the majority of issues that come with glare
from high-luminosity bodies, such as clouds or the sun.
A set rig significantly limits the distance that the objects will be from the camera
during runtime, allowing for further confidence in the predictions.
Static background
Angle range
Lighting
Apart from dust or unexpected objects present on the rig's surface, which should be
removed before usage - the background that the objects are in front of will stay
mostly consistent.
This reduces the necessity to gather data of the same object under backgrounds
that are not expected to be used during runtime.
While this project may be retrained and refocused to be utilised in many different
fields, it is trained for electrical component identification, which is aimed at
engineers.
Architectures
This project focuses on both existing engineers, and ones that are interested in
becoming engineers.
Having access to the quick identification of components provided by the project, a
count of each, and any potential additional information saves time that would
otherwise be spent manually analysing this information.
Average time per epoch
2 minutes
Average time per epoch
2 minutes and 20 seconds
SIP Resistor
Singular
Acronyms
SIP
Introduction
Single Inline Package
GPU
Graphics Processing Unit
CPU
Central Processing Unit
AI
Artificial Intelligence
LDR
Light Dependent Resistor
LED
Light Emitting Diode
AC
Alternating Current
DC
Direct Current
PCB terminal
PCB
Printed Circuit Board
The most prominent color may be identified by sorting all the colors from the image
into their hue values, and checking which hue is most active.
The color codes can be identified by processing the image with filters and other
techniques until only the prominent colors remain.
These can be processed into the actual ohm value (see the sketch below).
Then, the positions of the color bands relative to the body of the resistor can be used
to determine their order.
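As an illustration of the final step, below is a minimal C++ sketch (matching the language of the application) that maps already-extracted band colors to a resistance value. The 4-band assumption and the band names are illustrative only and are not taken from existing project code.

```cpp
// Hedged sketch: converting identified resistor bands into an ohm value.
// Assumes a 4-band resistor whose first three bands (digit, digit,
// multiplier) have already been identified by the color analysis above.
#include <cmath>
#include <iostream>
#include <map>
#include <string>
#include <vector>

int main() {
    // Standard digit values for resistor color bands.
    const std::map<std::string, int> digit = {
        {"black", 0}, {"brown", 1}, {"red", 2},    {"orange", 3}, {"yellow", 4},
        {"green", 5}, {"blue", 6},  {"violet", 7}, {"grey", 8},   {"white", 9}};

    // Bands in reading order: first digit, second digit, multiplier.
    const std::vector<std::string> bands = {"red", "red", "brown"};  // 220 ohm example

    const double ohms = (digit.at(bands[0]) * 10 + digit.at(bands[1]))
                        * std::pow(10.0, digit.at(bands[2]));
    std::cout << ohms << " ohm" << std::endl;  // prints 220
    return 0;
}
```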
The pin count can be identified by processing the image using filters until there is a
clear contrast between the body of the chip, and the pins.
One approach that could help identify the number of pins is drawing a line through
the pins and counting how many pins touch this line; the line that touches the most
pins would provide the pin count of this IC (see the sketch below).
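Below is a minimal OpenCV (C++) sketch of that line-crossing idea: threshold the IC crop so the pins stand out, then count bright runs along one scan row. The file name, threshold value and choice of row are assumptions for illustration only.

```cpp
// Hedged sketch: counting IC pins touched by a single scan line.
#include <opencv2/opencv.hpp>
#include <iostream>

int main() {
    cv::Mat gray = cv::imread("ic_crop.png", cv::IMREAD_GRAYSCALE);
    if (gray.empty()) return 1;

    // High-contrast filter: bright metal pins become white, the body black.
    cv::Mat bin;
    cv::threshold(gray, bin, 200, 255, cv::THRESH_BINARY);

    // Draw a virtual line across the pin row and count the white runs on it.
    const int row = bin.rows / 10;     // assumed to pass through the top pin row
    int pins = 0;
    bool inPin = false;
    for (int col = 0; col < bin.cols; ++col) {
        const bool white = bin.at<uchar>(row, col) > 0;
        if (white && !inPin) ++pins;   // a new run of white pixels = a new pin
        inPin = white;
    }
    std::cout << "Pins touched by this line: " << pins << std::endl;
    return 0;
}
```

In practice, several candidate rows would be scanned and the one touching the most pins kept, as described above.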
OCR may be used to extract the text based information.
OCR
Optical Character Recognition
Software-based reading of alphanumeric characters from an image that contains
written text.
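Below is a hedged sketch of this step using the Tesseract C++ API; it assumes Tesseract and Leptonica are installed, and that the cropped component image has already been saved to a file (the file name is an assumption).

```cpp
// Hedged sketch: running OCR on a cropped component image with Tesseract.
#include <tesseract/baseapi.h>
#include <leptonica/allheaders.h>
#include <iostream>

int main() {
    tesseract::TessBaseAPI ocr;
    if (ocr.Init(nullptr, "eng")) {        // load the English language data
        std::cerr << "Could not initialise Tesseract" << std::endl;
        return 1;
    }

    Pix* image = pixRead("ic_crop.png");   // cropped bounding box from inference
    if (!image) return 1;

    ocr.SetImage(image);
    char* text = ocr.GetUTF8Text();        // extracted marking text
    std::cout << "Component marking: " << text << std::endl;

    delete[] text;
    ocr.End();
    pixDestroy(&image);
    return 0;
}
```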
Input image
Inference method
Algorithm method
Different color LEDs may be trained as individual classes.
Has the disadvantage of requiring training for each individual LED separately, as
opposed to one generic LED.
Has the advantage of working on any LED.
Raw input
High contrast filter
Colors histogram
Approaches after filtering
Clearly prominent yellow
Has the disadvantage of potentially giving false information if the background is too
vibrant.
Contrast approach
HSV
Taking the average of the hue of all pixels whose value (brightness) is above a
certain threshold. Around 0.7 on a range from 0 to 1 should be appropriate.
Hue is in the range of 0 to 360 degrees.
The pink dots represent the pixel values obtained from the previous step.
Taking the average of this data, the result lands at a degree value that can easily be
determined to be yellow, by separating the hue circle into color sections by degree
ranges.
HSV, or Hue Saturation Value, is an alternative way to represent colors.
It can be advantageous over RGB in situations such as this.
RGB
Red Green Blue
Commonly used to refer to a way of defining colors by their Red, Green and Blue
properties.
HSV
Hue Saturation Value
Commonly used to refer to a way of defining colors by their Hue, Saturation and
Value properties.
Yellow is between 72° and 108° on the hue circle.
Note: This example would ignore colors that are darker than 0.7, on a range of 0 to
1.
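Below is a minimal OpenCV (C++) sketch of this thresholded hue average. OpenCV stores 8-bit HSV with hue in 0-179 (degrees divided by two) and value in 0-255, so the 0.7 brightness threshold and the 72°-108° yellow range quoted above are converted accordingly; the input file name is an assumption.

```cpp
// Hedged sketch: average hue of bright pixels to find the dominant LED color.
#include <opencv2/opencv.hpp>
#include <iostream>

int main() {
    cv::Mat bgr = cv::imread("led_crop.png");
    if (bgr.empty()) return 1;

    cv::Mat hsv;
    cv::cvtColor(bgr, hsv, cv::COLOR_BGR2HSV);

    double hueSum = 0.0;
    long count = 0;
    for (int y = 0; y < hsv.rows; ++y) {
        for (int x = 0; x < hsv.cols; ++x) {
            const cv::Vec3b px = hsv.at<cv::Vec3b>(y, x);
            if (px[2] > 0.7 * 255) {       // keep only bright pixels
                hueSum += px[0] * 2.0;     // back to 0-360 degrees
                ++count;
            }
        }
    }
    if (count == 0) return 1;

    const double meanHue = hueSum / count;  // fine for yellow; hue wraps near red
    std::cout << "Mean hue: " << meanHue << " degrees" << std::endl;
    if (meanHue >= 72.0 && meanHue <= 108.0)
        std::cout << "Dominant color: yellow" << std::endl;
    return 0;
}
```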
The ability to take a snapshot of the current frame, define appropriate labels, and
save this labeled snapshot for future training - all from inside the GUI.
Alternatively, taking snapshots from the GUI and saving them for later labeling.
Sorted from highest priority, to lowest.
Setting up the camera on a rig.
Base GUI
GUI with the essentials to interface with the camera over USB (a minimal capture sketch follows this list), including
A live display from the camera on the rig.
Ability to take images by pressing a button.
Support for running Inference.
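Independent of the Qt GUI code, the minimal OpenCV (C++) sketch below shows the underlying camera interface: open the first USB camera, show the live feed, and save a frame when 's' is pressed. The device index and file names are assumptions.

```cpp
// Hedged sketch: live USB camera feed with snapshot capture.
#include <opencv2/opencv.hpp>
#include <string>

int main() {
    cv::VideoCapture cam(0);                  // first USB camera on the rig
    if (!cam.isOpened()) return 1;

    cv::Mat frame;
    int saved = 0;
    while (cam.read(frame)) {
        cv::imshow("Rig camera", frame);
        const int key = cv::waitKey(1);
        if (key == 's')                        // take a snapshot for the dataset
            cv::imwrite("capture_" + std::to_string(saved++) + ".png", frame);
        else if (key == 'q')                   // quit
            break;
    }
    return 0;
}
```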
~100 images of a single class, taken from the rig for initial training and testing of the
model.
Initial dataset gathering
For the purpose of testing inference on the rig.
Proof of concept. The results will not be perfect as the dataset is minimal, and only
contains 1 class.
Further dataset gathering
At least 250 pictures of each class of component that the project is designed to
detect.
Further model training
This training will take considerably longer than the initial training - around 2 minutes
per epoch - and should be run for at least 300 epochs.
The initial training should not take long at all, and does not need to be polished.
Training for ~100 epochs should be sufficient, with each epoch taking ~20 seconds
on the machine available.
Rig
Model Training via Deep Learning
Machine used
Personal Computer
CPU
GPU
AMD Ryzen™ 7 5800X3D
Core count
8
Base clock frequency
3.4GHz
L3 Cache
96MB
Maximum operating temperature
90°C
Thread count
16
GeForce RTX 3060 Ti
Memory
Capacity
8192MB
Type
GDDR6X
CUDA core count
4864
Base clock frequency
1.41GHz
The goal is to reach confidence values of 0.8 on a range of 0 to 1.
Ability to gather further information from the detection bounding boxes provided by
the inference.
After the previous steps are in good shape, investigation of moving the inference to
a mobile device will begin.
If the confidence values are not up to standard, more data will be gathered from this
and potentially other mobile devices, and further training will follow, until the results
are adequate.
If adequate results are achieved before the deadline of this project, deployment to a
mobile device will be started.
If the frame rates are not sufficient, the inference may be run on still images to
improve the user experience.
Optional: Ability to label the images from the device, without requiring external
software.
It may be advantageous, given the timeframe of the project, to instead gather data
during a session and label it afterwards.
Memory
Capacity
Type
2x16GB
DDR4
Frequency
3.6GHz
Brand
Corsair
Name
Vengeance RGB PRO SL
Link
https://www.corsair.com/eu/en/Categories/Products/Memory/Vengeance-RGB-PRO-SL-Black/p/CMH32GX4M2E3200C16
Brand
AMD
Name
Ryzen 7 5800X3D
Link
https://www.amd.com/en/products/cpu/amd-ryzen-7-5800x3d
Brand
NVIDIA
Series
30
Name
RTX 3060Ti
Link
https://www.nvidia.com/en-gb/geforce/graphics-cards/30-series/rtx-3060-3060ti/
CUDA
Special cores that are designed for compute-intensive tasks.
These run in parallel with the CPU, and may also run in parallel across multiple GPUs.
They are well suited to deep learning, as deep learning is incredibly compute-intensive.
Deep learning training times are predictable, and stay mostly constant between
epochs.
The workload parallelises cleanly, so the more processing power available, the
quicker each epoch will finish.
Each of these steps should be polished before continuing to the next one, to provide
a solid foundation for the next step to be based on.
Analysis of the models
Brief History
YOLO, which stands for You Only Look Once, is a popular image segmentation and
object detection model that was originally developed by Joseph Redmon and Ali
Farhadi.
The first version was released in 2015, and it very quickly became popular due to
the significantly superior speed and accuracy when compared to other architectures.
YOLOv1
YOLOv4
Released in 2018, introducing Mosaic data augmentation and a new, improved loss
function - decreasing the time taken to achieve better results for the trained model.
YOLOv5
Released in 2020, introducing support for object tracking, which allows following a
moving object, and panoptic segmentation, which allows identification of
overlapping objects with accurate bounding boxes.
Ultralytics YOLOv8
The latest version of YOLO as of today. YOLOv8 is a state-of-the-art model that
builds upon the already very successful previous YOLO versions, introducing new
performance and flexibility features.
Full support for previous YOLO versions, making it convenient for existing users to
take advantage of the new features.
Versions
Comparison
In general, YOLOv8 is superior to all of its predecessors.
While YOLOv5 mostly underperforms compared to the later versions, it is important
to note how minimal the delays are even on a version that is now outdated.
YOLO offers pretrained models that are used as a starting point for training custom models.
Each model has its advantages and disadvantages, and should be picked
depending on the project.
Size
mAP single-model single-scale values while detecting on the COCO val2017
dataset.
Speed
Averaged time taken using the Amazon EC2 P4d instance on the COCO dataset.
The pixel height and width the model operates up to.
Params (In Millions)
The number of parameters that are tweaked per epoch while training, and
processed during inference.
FLOPS
Floating Point Operations Per Second
A measure based on Floating Point Operations that is relevant in the field of Deep
Learning.
Diminishing returns can be observed in the mAP values when compared to the time
taken (speed).
Model properties
In some circumstances, maximum precision is essential and is prioritised over the
hardware requirements. This is when a larger model should be chosen.
In the scope of this project - the YOLOv8m model has been chosen.
The rationale behind this choice is to take advantage of the high mAP value while
not increasing the time taken too much, in preparation for a future mobile
deployment of the model.
Comparing the YOLOv5 and YOLOv8 versions, a clear advantage can be seen
when taking into account the size of the model (parameter count), the resulting mAP
output, and the time taken.
Architecture choice
YOLO has been chosen as the architecture that this project utilises for the AI
detection.
At the start of the project, there was already a high bias towards YOLO due to the
highly positive past experience with YOLOv5 and all the incredible features that it
offers.
Upon release of YOLOv8 and all the superior features and specifications that it
provides on top of the previous versions - the YOLO family was an obvious choice in
the architecture that will be used for the project.
As the name suggests, YOLO focuses on detection of multiple classes in a single
"look", which is a single analysis of the entire input image.
Compared to many architectures that came before YOLO, this is a far superior
approach: no matter how quick those architectures may be, they approach detection
by re-analysing the entire image for every single class the model was trained on -
increasing the time taken per detection additively with each class.
An approach like this may seem too good to be true, as though it should come at a
significant cost to the speed and confidence of the model.
But when the results are analysed, that could hardly be further from the truth.
YOLO is an incredibly efficient and accurate architecture.
These days most sophisticated architectures approach object detection similarly to
YOLO, but YOLO is still a state-of-the-art architecture that continues to improve and
grow to this day.
Internal AI Object Detection steps
Classification
Object Detection
Segmentation
The process of identifying the exact bounding box of the item detected.
The bounding, by a box, of the classified segments of the image.
The identification of a part of an image believed to contain an item of a class the
model was trained to detect.
Visual examples
Resizing
Joining up of multiple images to create new ones
The reduction in the data required to train makes it feasible to train relatively
high-quality models from data gathered, and training performed, at home.
Marking Codes
Hardware
Raspberry Pi 4
Beaglebone
Nvidia Jetson Nano
Intel Neural Compute Stick 2
Specifications
Processor Base Frequency
700MHz
Memory
2GB
Specifications
Core Count (SHAVE)
16
Advantage
Offers computational power through a USB connection - can be used to run
Inference on existing devices, such as a laptop.
Specifications
Specifications
Core Count
4
Maximum Frequency
700MHz
Resistors and Inductors
Capacitors
ICs
Color coded
Number coded
Android Phone
Specifications depend on the specific device.
Widely and easily accessible.
The vast majority of mobile phones on the market today have a built-in camera.
YOLO
You Only Look Once
An image detection architecture that the project is based on.
CUDA cores provided by the GPU
CPU
Inference
Training
Personal Computer
Rented Dedicated Server
Advantages
Disadvantages
Advantages
Disadvantages
Local - Provided a local machine is already owned, it is immediately available.
Utilises multiple GPUs - Quicker epoch computations, resulting in quicker training.
Cloud based
Allows for parallel computing, as opposed to using your personal computer at home.
Cloud based - upload and download times
Datasets tend to be considerably big in size.
A smaller dataset of ~2000 images takes up ~3 GB of space.
This is not a significant amount of data for a local machine to transfer, but it is a
considerable amount for uploading.
Cost
The bigger the server - the higher the rates become.
Cost
As opposed to a rented server - acquiring your own machine has the benefit of
owning the machine, and being able to use it indefinitely (Or until it eventually
breaks.)
While the initial cost of acquiring an adequate machine for deep learning is higher
than renting a server for a few months, it is a worthwhile long-term investment into a
machine that can be used for a variety of casual or intensive tasks.
Setup time
Setup time
Speed
Speed
When compared to a sophisticated server that runs many GPUs - a local machine
will most likely process the training at a slower rate than a dedicated server would.
A local machine will likely contain one, maybe two GPUs.
Pictures are taken from the machine itself. No upload/download times.
Devices
Discussion
After the training is done - which usually takes tens, and sometimes hundreds, of
hours, depending on the size of the dataset and the epoch count - running the
trained model for inference only takes milliseconds to process a single frame.
R-CNN
Description
Disadvantages
Not real-time.
On average, takes 47 seconds to process a single frame.
Discussion
It should be noted that R-CNN has successors called Fast R-CNN and Faster R-CNN.
However, even the fastest of the successors still barely manages 5 frames a second
at best.
R-CNN, which stands for Region-Based Convolutional Neural Network, was
released in 2013. Like other object detection architectures, R-CNN takes an input
image and outlines bounding boxes where it believes an item of a certain class is
present.
While 5 frames a second is an impressive and definitely usable result, there are
alternative architectures that offer a significant improvement in inference time.
Developed by Ross Girshick
SSD
Description
SSD, which stands for Single Shot Detector, was released in 2017.
Developed mostly by Max deGroot and Ellis Brown
Discussion
Offers great frame rates, averaging 45 frames per second when tested on what is
now a relatively old graphics card, the NVIDIA GTX 1060.
Disadvantages
According to the Git repository, the project was seemingly abandoned about 4 years
ago.
According to the Git repository, the project was seemingly abandoned about 5 years
ago.
Discussion
The component of a computer where the core computations are processed.
An optional component of a computer that is dedicated to, and optimised for,
computing graphical tasks.
Existing labelling software offers quality-of-life features, such as rough auto-labelling
of the images, which only requires the user to adjust the bounding boxes and
confirm their validity, rather than having to define the boxes from scratch.
Inference Example
Inference Example
Discussion
Surprisingly good results for a model trained from 120 images, with confidence
values above 0.8 and sometimes over 0.9!
Pretrained model used
yolov5s
Architecture
YOLOv5
Architecture
Pretrained model used
yolov5m
YOLOv5
Architecture
Pretrained model used
yolov5m
YOLOv5
Discussion
Rather poor results. Confidence values usually below 0.7, struggled to classify
accurately.
Discussion
Great results with confidence values consistently above 0.8, classifying all classes
accurately!
Technology utilised
Deep learning computation with CPU Cores and GPU CUDA Cores running in
parallel.
220Ohm resistor example
Color codes
Red = 2
Brown = 1
Gold = 5% tolerance
100nF capacitor example
Unfortunately for the purposes of automatic identification of Integrated Circuit
markings, most IC manufacturers do not follow any global standard for marking their
ICs.
Most manufacturers tend to have their own internal IC marking standards.
Due to this fact - only known markings can be used to identify components.
Mixed manufacturer ICs example
This example illustrates the vast variation in markings, and how few of them are
identifiable without access to datasheets.
YOLOv5
Architecture
A board whose conductive layer is partially etched away, leaving conductive tracks
only in specific positions that are pre-planned using CAD software.
Widely used to implement electronic circuits.
CAD
Computer Aided Design
CAD software accelerates and automates design work in various fields. Instructions
given to the computer are turned into designs through complex yet intuitive, usually
GUI-based, interactive programs.
Electrical current that oscillates.
Electrical current that stays constant.
An electrical component that emits light when current is passed through the circuit.
A resistor whose resistance varies relative to the amount of light the body of the
component is exposed to.
A ring light has been introduced for both training and inference running.
Progress
Issues encountered
A glitch in augmentation provided by YOLOv5, where rotation during augmentation
has shifted the bounding boxes of the components, causing inaccurate feedback to
the model, preventing it from training appropriately.
Actual bounding boxes after rotation augmentation
Note the unnecessarily expanded bounding boxes.
Description
Submitted GitHub issue
Link
https://github.com/ultralytics/yolov5/issues/10639
Information gathered from replies as of today's date
This issue has been reported to be part of YOLOv7 augmentation also.
Example
Expected bounding boxes after rotation augmentation
Note the snug fit of the bounding box around the edges of the component.
That is desirable, as it provides accurate information on what the model should be
looking for.
This will train the model in undesirable ways, detecting parts it should not.
Augmentation rotation issue
Software
Description
Augmentation
Training
Description
Training versus Evaluation
Labels
Windows/Linux/Mac Desktop/Laptop Machine
Discussion
Discussion
Specifications depend on the specific device.
The specs of a desktop/laptop machine will most likely beat the specs of both a
phone, and a microprocessor.
Desktops are widely accessible in environments where it would be relevant to use
this project, such as the home of the user, or the campus a student is in.
Ease of access
Ease of access
Most people own a mobile device, and have it on them in most cases.
Ease of access
Due to the device being specialised for neural computations, it is not a common
device by any means.
Combined with the price tag of ~100 eur, this device will likely only be owned by
developers, as opposed to users.
As this device is unlikely to be owned by a user of the project, it would not be wise to
require owning one to run our inference.
Discussion
The project will be able to support a compute stick as an alternative to a GPU.
The machine must have permissions for USB connections and running the
application.
The app may be obtained from an app store, which mobile devices have easy
access to as long as they have access to the internet.
Specifications
Specifications
There are countless types of Android devices on the market, all with varying
specifications.
Camera
Platforms
The application is designed through Qt Creator.
Qt Creator is cross-platform.
Cross-Platform
Microprocessors
USB computation extensions
Exactly how long is directly tied to the speed of the hardware the model is being run
on, and the size of the model.
Even with all the speed optimisations offered by the YOLO family, a lower end
device such as a Raspberry Pi 4 may take 1-2 seconds to process a single 360p
image.
It is important to pick appropriate hardware for your particular use cases.
CPU
Core Count
4
GPU
Maximum Frequency
1.5GHz
CPU
Core Count
1
Maximum Frequency
1GHz
Core Count
Maximum Frequency
GPU
2
532MHz
CPU
GPU
Core Count
4
Maximum Frequency
1.479GHz
Core Count
Maximum Frequency
128
921MHz
Discussion
This microprocessor is targeted towards quick graphical computations, which can
instead be used for deep learning.
Discussion
Conclusion
Despite the additional strains, the project is currently on track and is following the
planned milestones in sequence.
The original project concept at the time of the project proposal has since had a
slight shift in focus.
Existing Solutions
The Problem
Importance of object detection
Algorithmic object detection
Detection of objects from an image
There are various cases in which automation of object detection as opposed to
having a human constantly observing footage is beneficial.
Quality control
Security
Analysis
AI based Object Detection
Production
In the field of security through digital cameras, object detection is an incredibly
useful tool for monitoring for potential intruders into a facility.
For facilities that utilise dozens, or even hundreds of cameras - object detection is a
very valuable tool that requires minimal human interaction, with high level of
certainty, and 24/7 attention.
Depending on the security requirements, lower-security facilities may not require
hiring a person for constant monitoring of the security footage, as object detection
offers live notifications for any unexpected activity observed.
During production of anything from farm produce, to electronic components -
consistent detection and rejection of items with damage is essential.
With use of object detection, flaws can be recognised incredibly efficiently, and this
information may be passed onto the production line, identifying exactly which item
was detected to have flaws, and be discarded automatically, without ever requiring
any human interaction.
If the detection is sophisticated enough to be more reliable than a person, this
opens up new opportunities for the speed and efficiency of the production, as a
computer's computational power may be expanded, unlike a person's.
Thorough inspection of final products is a crucial part of many fields of production.
Features that may have developed during the process of manufacture may be
detected on the final products.
This includes positive, negative, or purely analytical features.
Examples
Developments of a petri dish colony
PCB manufacturing error
Potential signs of disease on farms
Material production flaws
Algorithm based detection can be used to effectively identify very specific criteria,
which can be expressed as an analytical value, or a trend.
Specific color based properties.
Specific shapes by following line trends.
Must be coherent shapes, does not perform well with partial shapes.
Trees, cats, dogs, people, cars, etc.
Specific patterns.
Training using Deep Learning
Epoch
Inference
A complex combination of usually many millions of digital neurons, with analog
based values for each neuron, which result in a form of decision making based on a
massive combination of criteria, rather than individual pixels.
The process of tweaking the entire model based on the existing parameters and the
current output it produces - by use of a loss function, randomness, and clever
techniques - in the hope of improving the detection ability on the data the model was
trained upon.
These neurons work together to identify the incoming information and produce an
output that resembles what the network has learned during the training of the model.
Training is usually based on a pre-trained model that was trained on a big dataset.
Most pre-trained models are trained on the COCO dataset, which is publicly
available and holds a vast amount of data. This kickstarts the model with data that it
can repurpose and use as a base.
Initially, the model parameters are set to a random state.
The best performing model is kept between the newly trained model, and the
previous best.
Many epochs are ran in order to polish the model as much as possible.
The output it produces when fed input data is, of course, also random.
Epochs are ran on the model to train the model.
In object detection, inference is the utilisation of the model to process classifications
of objects on the image.
Classification
The process of using the trained model to identify objects from an input image,
through steps that depend on the architecture used.
The identified objects are marked with a bounding box, the class they belong to,
and a confidence value for the prediction.
Because of how far ahead the YOLO architecture is when compared to most other
architectures, it is utilised very commonly throughout object detection projects.
Internal steps of the You Only Look Once Inference
The input image.
The input image is split into a S by S grid, S being 7 in this example.
Each cell predicts the bounding boxes, and confidence values of each box.
These steps are repeated for each of the grid cells, until everything is identified.
All the identified bounding boxes.
And all the probabilities for each box.
Each of the bounding boxes is checked for how much of each probability cell it
covers, and is "shaded", in the case of this example, with those probabilities.
Finally, the bounding boxes are reduced using confidence thresholds and NMS.
NMS
Non-maximum Suppression
A filtering technique used on the predictions of object detectors.
It keeps the highest-confidence box and suppresses any overlapping boxes whose
intersection with it is too large.
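To make this concrete, below is a hedged C++ sketch of greedy NMS. The Detection struct, the 0.5 IoU threshold, and the simplification that all boxes belong to a single class (in practice NMS is applied per class) are illustrative assumptions.

```cpp
// Hedged sketch: greedy Non-maximum Suppression over single-class detections.
#include <algorithm>
#include <vector>

struct Detection {
    float x, y, w, h;    // bounding box: top-left corner, width, height
    float confidence;
};

// Intersection over Union of two boxes.
static float iou(const Detection& a, const Detection& b) {
    const float x1 = std::max(a.x, b.x), y1 = std::max(a.y, b.y);
    const float x2 = std::min(a.x + a.w, b.x + b.w);
    const float y2 = std::min(a.y + a.h, b.y + b.h);
    const float inter = std::max(0.f, x2 - x1) * std::max(0.f, y2 - y1);
    const float uni = a.w * a.h + b.w * b.h - inter;
    return uni > 0.f ? inter / uni : 0.f;
}

std::vector<Detection> nms(std::vector<Detection> dets, float iouThreshold = 0.5f) {
    // Consider the highest-confidence boxes first.
    std::sort(dets.begin(), dets.end(), [](const Detection& a, const Detection& b) {
        return a.confidence > b.confidence;
    });
    std::vector<Detection> kept;
    for (const Detection& d : dets) {
        bool suppressed = false;
        for (const Detection& k : kept)
            if (iou(d, k) > iouThreshold) { suppressed = true; break; }
        if (!suppressed) kept.push_back(d);   // keep boxes that do not overlap anything already kept
    }
    return kept;
}
```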
REFERENCE: https://web.cs.ucdavis.edu/~yjlee/teaching/ecs289g-winter2018/YOLO.pdf
This is the final output that YOLO provides: bounding boxes with a classification (in
this example, the classification is marked by a color; the real output is a string) and
a confidence value (in this example, confidence is marked by the opacity of the
boxes; the real output is a number ranging from 0 to 1).
The project was originally intended to focus on a more familiar field, more on the
theme of a high-precision tool that requires a set rig to push beyond limitations and
produce high-quality results.
This has since shifted towards a more flexible, but less precise, concept with the
consideration of more generic use through mobile deployment.
As mobile deployment of object detection was not explored, this has put additional
strain on the timeline.
Great results in object detection have now been achieved through the set rig,
however, that is not quite the case outside of a set rig.
The model has now been trained on over 3000 images that were taken and labeled
over the course of several months.
At this rate, a significant increase in the dataset would be required for adequate
results outside of a set rig, which would require a significant amount of additional
time investment.
This, of course, is not a very viable option without neglecting other parts of the
project.
Transition from YOLOv5 to YOLOv8 is currently being considered.
YOLOv8, as of right now, only has experimental versions of some features that are
essential to the success of this project; these are currently being tested to see if
they will be adequate enough to upgrade the current architecture.
Overall, YOLOv8 has shown a considerable increase in confidence values.
After achieving great confidence values with the latest model, the next milestone is
image post-processing.
By B00125142 Violet Concordia
Supervised by Benjamin Toland
Introduction of random rotation, with respect to the label position, specified in a 0 to
360° range (see the sketch after these augmentation notes).
Introduction of random spots of blur. Specified in frequency and intensity.
Introduction of random scaling of the image, at a specified frequency and range of
scaling.
Simulates the real-life scenario of the feed being provided at a different angle.
Simulates the real-life scenario of changes in focus, fog, and smudges on the
camera lens.
Simulates the real-life scenario of distance. Up-close objects cover a far bigger pixel
area in an image than far-away ones, and the model should be trained against this
to prevent it from only detecting objects at a certain distance.
Cutting and joining of images into new images.
Creates additional images from the existing dataset that appear unique, making the
most of the existing dataset, with only a slight loss in the training information
carried.
Simulates bright objects interacting with the lens, ensuring the model does not get
confused by glare in real-life scenarios.
Introduction of random glares, of specified frequency and intensity.
Introduction of random spots and smudges.
Simulates dust, dirt, and other particles that may be present in a real life scenario,
ensuring the model does not get confused by a partial coverage of an object.
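As a sketch of how rotation augmentation can keep the label consistent (and avoid the expanded-box behaviour described under the issues encountered), the OpenCV (C++) example below rotates an image and recomputes a tight axis-aligned box from the transformed corners. The angle, file names and box coordinates are illustrative assumptions, not the augmentation code used by YOLOv5.

```cpp
// Hedged sketch: rotate an image and its bounding box label together.
#include <opencv2/opencv.hpp>
#include <vector>

int main() {
    cv::Mat img = cv::imread("resistor.png");
    if (img.empty()) return 1;

    const cv::Rect2f box(120.f, 80.f, 60.f, 200.f);   // original label
    const float angle = 35.f;                          // random angle in 0-360

    const cv::Point2f centre(img.cols / 2.f, img.rows / 2.f);
    const cv::Mat rot = cv::getRotationMatrix2D(centre, angle, 1.0);

    cv::Mat rotated;
    cv::warpAffine(img, rotated, rot, img.size());

    // Transform the box corners with the same matrix as the pixels.
    const std::vector<cv::Point2f> corners = {
        {box.x, box.y}, {box.x + box.width, box.y},
        {box.x, box.y + box.height}, {box.x + box.width, box.y + box.height}};
    std::vector<cv::Point2f> moved;
    cv::transform(corners, moved, rot);

    // A tight axis-aligned box around the moved corners becomes the new label.
    const cv::Rect newBox = cv::boundingRect(moved);
    cv::rectangle(rotated, newBox, cv::Scalar(0, 255, 0), 2);
    cv::imwrite("resistor_rotated.png", rotated);
    return 0;
}
```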
Abstract
Acknowledgement
This report illustrates the development of this 4th year Computer Engineering project.
I am very grateful to Technological University Dublin for accepting me as a student,
and providing me with the opportunity to take on this project.
Declaration
The material contained in this assignment is the author’s original work, except where
work quoted is duly acknowledged in the text. No aspect of this assignment has
been previously submitted in any other unit or course.
IDE
Integrated Development Environment
A sophisticated text editor, designed specifically for code development in certain
languages.
Most IDEs today come with a high range of optional plugins that can be used to
further increase production speed, and reduce redundant tasks via automation of
said tasks.
Several of the same type of component, all packaged in a single line.
For example, a SIP resistor, which features multiple resistors all connected to a
single ground pin.
A term often used alongside deep learning: the process of attempting to simulate
intelligence similar to that of a human by use of a computer, in order to tackle
problems that a standard style of computer operation either fails at, or performs
incredibly slowly.
Course
Code
TU807
Year
4th
Title
Bachelor of Engineering (Honours) in Computer Engineering in Mobile Systems
The report hopes to successfully portray an accurate thought process and workflow
of this project, in full detail, while giving credit where credit is due, and thoroughly
explaining the choices made during the development of this project.
The report has been organised into sections that each cover a wide topic, with table
of contents to aid in smooth navigation of the contents.
The project aims to develop an everyday utility tool for engineers, and for those
interested in becoming engineers, that uses AI-based object detection to identify
electrical components and extract additional information about them, saving time
that would otherwise be spent manually analysing this information.
I’d like to acknowledge and express my gratitude towards my project supervisor,
Benjamin Toland – who has taken on me and my custom project and has provided
excellent guidance and feedback throughout the development of this project.
I am also very grateful for the existence and availability of search engines, the one
primarily used for this project being Google. It is a valuable resource for research,
even though one should not take every search result at face value, and should
ensure at least a few trusted sources agree with the findings.
I would also like to thank my wonderful old and new friends and peers that I was
able to meet thanks to my ability to attend TUD Blanchardstown Campus, and the
quiet spaces provided for us to further our education with minimal interruptions.
I am also grateful to my brother, Justas Bartnykas - who is also an engineer. I am
grateful to him for introducing me to Qt Creator, which has been my go-to IDE for
development of C++ code for more than 5 years now. He also provided access to a
fairly expensive camera that he owned, which allowed the project to begin sooner
than it otherwise could have.
I’d like to offer my sincere apology and thanks to anyone else I may have missed
that has contributed to this project in any way.
When we look at an image, we can immediately discern objects that are displayed,
without ever having to think about it. It is effortless.
We use a combination of past experiences and deduction to determine information
almost immediately.
This ability of ours as humans is thanks to millions of years of evolution, in a world
where not detecting a potential threat makes the difference between life and death. It
is in our subconscious nature to detect objects at a moment's notice.
The problem arises when we want to implement a computer to process object
detection for us.
A computer is built upon the simplest of arithmetic tasks, performed by an incredible
number of transistors combined into massive systems, which operate entirely in
digital binary.
A computer has no concept of learning from past experiences, nor any feelings that
may affect its decisions. Given the same instructions, it will produce the same
results on its first day of manufacture, and the last day of its operation (provided the
unit was not damaged in a way that it would yield unpredictable results).
Computers are designed to process computational tasks at complete precision and
consistency.
When a computer is exposed to an image in the form of a pixel grid, all it truly sees
are numerical values assigned to each cell of the grid.
Computers have no concept of color, let alone real life objects.
The simple action of either moving or scaling an object completely changes the
arrangement of the pixels that represent this object.
To a person - this is no issue. It is obvious that this is still the same object. However,
to a computer - the data has just shifted around entirely.
Examples
A close up of a 220ohm resistor
An even further close up of the first band, where each pixel is beginning to show
The same band, rotated at an arbitrary angle
The width and height of the image have both changed due to this rotation.
The order of the pixels has changed drastically.
However, it will struggle heavily at detecting and identifying any generic object.
Examples
Examples
While this is a complex issue to tackle already, it only becomes more complex when
we want to detect something generic, such as a tree.
A tree has many properties that may differ, such as the presence, color, shape, and
size of leaves/needles, the thickness, size, and color of the trunk, branching styles,
and so on - while still being effortlessly identifiable by an intelligent creature.
The dataset is separated into two categories: one that the model is adjusted on, and
another that the model is run on to determine the new confidence values achieved
by the alterations.
Both datasets must be labeled for fully automated evaluation.
Labeling is an essential part of training for Object Detection.
Labels are bounding boxes that specify which object is present, and where, in the
input dataset, so that the model can learn to detect them (an example label file is
shown below).
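For reference, YOLO-style labels are stored as one text file per image, with one line per object in the form "class_index x_centre y_centre width height", all normalised to the 0-1 range. The two lines below are an illustrative, made-up label file for an image containing two resistors, assuming the class ordering listed earlier places resistor at index 6.

```
6 0.512 0.430 0.118 0.305
6 0.250 0.615 0.102 0.290
```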
Neural Networks
The networks are trained through deep learning.
Neural networks are an imitation of the intelligence of biological creatures, which is
known as Artificial Intelligence. AI is designed to be run on a computer.
Description
They have the ability to be trained, which simulates the past experience of biological
intelligence.
In the scope of this project, somewhere between 250 and 500 epochs will suffice.
The boost in results is limited to the set rig.
Created With
EdrawMind